ABSTRACT


We  analyze  the  rate  of  convergence  of  a  class  of  algorithms  based  on  n-dimensional 
interpolation.  In  particular,  we  present  a  class  of  algorithms  which  use  first  order 
information  only,  while  maintaining  quadratic  convergence. 
1. Introduction

Most of the commonly used algorithms for the unconstrained minimization of
$f: R^n \to R$, for $n > 1$, are descent methods. These are based on the iteration
$x_{i+1} = x_i + \alpha_i d_i$, where $d_i \in R^n$ is a search direction and $\alpha_i \in R$ is the step
size, usually determined by a line search as a minimizer of $f(x_i + \alpha d_i)$ over $\alpha > 0$.

The exception is Newton's method, which is based on interpolating f by a
quadratic and minimizing this quadratic at each step of the algorithm.

For  a  general  discussion  of  unconstrained  minimization  techniques  see  [1,8]. 

The  following  points  are  relevant  to  our  discussion. 

The classical steepest descent method uses first order (i.e., gradient)
information only. Its main drawback is its linear rate of convergence.

Newton's method converges quadratically. However, it necessitates the costly
computation of the Hessian at each iteration.

Other methods based on first order information are known to converge
superlinearly (e.g., [6]).

Many of these methods approximate Newton's method in the sense that the search
direction they generate can be shown to be a direction along which an appropriate
quadratic is minimized.

A different approach is based on the unfounded assumption that algorithms
having the finite termination property (i.e., solution in a finite number of steps)
for a class of functions wider than the class of quadratics are faster than those
having the quadratic termination property. Thus, Jacobson and Oksman [7] generalize
from quadratic termination to homogeneous function termination. This was further
generalized (see, e.g., [4]).

Another step toward discarding the quadratic model has recently been taken
by Davidon [5]. His motivation, however, partly coincides with ours.




In this paper, we analyze the rate of convergence of n-dimensional interpolation
algorithms for unconstrained minimization. We note the following.

Most of the commonly used algorithms for one-dimensional minimization are based
on polynomial interpolation. It is well known that Newton's method in this case is
inefficient in the sense that quadratic convergence can be achieved using first order
information only (e.g., [8, p. 142]), by using two interpolation points rather than
one. This should make one doubt whether Newton's method is a suitable model for
efficient algorithms.
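The one-dimensional remark above can be illustrated with a minimal sketch. It assumes the two-point interpolant is the cubic matching f and f' at the two most recent points (a standard cubic-fit step, not necessarily the scheme of [8]); the function name `two_point_cubic_min` and the test problem are hypothetical choices made for illustration.

```python
import math

def two_point_cubic_min(f, df, x0, x1, tol=1e-8, max_iter=50):
    """Minimize a 1-D function by repeated two-point cubic (Hermite) fits.

    Each step fits the cubic that matches f and f' at the two latest points
    and moves to that cubic's minimizer; no second derivatives are used.
    """
    f0, g0 = f(x0), df(x0)
    f1, g1 = f(x1), df(x1)
    for _ in range(max_iter):
        if abs(g1) < tol:          # gradient small enough: stop
            break
        d1 = g0 + g1 - 3.0 * (f0 - f1) / (x0 - x1)
        disc = max(d1 * d1 - g0 * g1, 0.0)   # guard against rounding below zero
        d2 = math.copysign(math.sqrt(disc), x1 - x0)
        x_new = x1 - (x1 - x0) * (g1 + d2 - d1) / (g1 - g0 + 2.0 * d2)
        x0, f0, g0 = x1, f1, g1    # keep only the two latest points
        x1, f1, g1 = x_new, f(x_new), df(x_new)
    return x1

# Hypothetical test problem: f(x) = exp(x) - 2x, minimized at x = ln 2.
def f(x):
    return math.exp(x) - 2.0 * x

def df(x):
    return math.exp(x) - 2.0

x_min = two_point_cubic_min(f, df, 0.0, 1.0)
```

Only function and gradient values enter each step, which is the sense in which first order information suffices here.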

Quadratics are inadequate for n-dimensional two-point interpolation with zero
and first order information (see Davidon [5]). Therefore, non-polynomial interpolation
is necessary. Our analysis shows that the rate of convergence is independent of the
interpolating function. For interpolatory algorithms, therefore, the question whether
the search direction coincides with Newton's direction or generalizes it is
irrelevant to the rate of convergence analysis. The same is true for termination properties.

The  main  difficulty  in  the  analysis  of  n-dimensional  interpolation  algorithms  is 
that  the  formulas  for  the  error  in  n-dimensional  interpolation  are  not  suitable  for 
this  purpose.  We  overcome  this  difficulty  by  reducing  the  n-dimensional  problem  to 
an  appropriate  one-dimensional  interpolation  problem. 

2.  Minimization  by  Interpolation 

The interpolation algorithm we study generates a sequence $\{x_i\}$ as follows.

Let $s \ge 1$, $m \ge 0$ be fixed integers. Given $m + 1$ approximants $x_0, \ldots, x_m$ to
the solution of

(1)    $\nabla f(x^*) = 0$,

we use $x_i, x_{i-1}, \ldots, x_{i-m}$ to construct a new approximant $x_{i+1}$. First we
interpolate f by T, requiring

(2)    $T^{(k)}(x_{i-j}) = f^{(k)}(x_{i-j})$,   $j = 0, \ldots, m$;  $k = 0, \ldots, s-1$.

Here $f^{(1)} = \nabla f$, $f^{(2)} = \nabla^2 f$, etc., and $T: R^n \to R$ is assumed to depend on some
parameters to be determined by (2). The new point is determined by

(3)    $\nabla T(x_{i+1}) = 0$.


In the following, we assume that equations (1)-(3) have solutions.

We define the rate (or order) of convergence of a sequence $\{x_i\}$ converging
to $x^*$ as the number p (if it exists) such that

$\dfrac{\|x_{i+1} - x^*\|}{\|x_i - x^*\|^p} \to c \ne 0$.


Here $\|\cdot\|$ is a fixed arbitrary norm. Ortega and Rheinboldt [9, 19] refer to the
rate p defined above as the C-order of the sequence $\{x_i\}$. When it exists, it
coincides with their Q- and R-orders. We will unify our results for the C-, Q-
and R-orders through the use of the C-order of convergence.
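As a small numerical illustration of this definition, the order p can be estimated from three consecutive error norms: if e_{i+1} is close to c * e_i^p, then e_{i+1}/e_i is close to (e_i/e_{i-1})^p. The sketch below assumes the limit in the definition exists; the helper name `estimate_order` and the sample error sequence are hypothetical.

```python
import math

def estimate_order(errors):
    """Return successive estimates of the order p from a list of error norms.

    Uses p_i ~ log(e_{i+1}/e_i) / log(e_i/e_{i-1}), which follows from
    e_{i+1} ~ c * e_i**p when the limit in the definition exists.
    """
    estimates = []
    for prev, cur, nxt in zip(errors, errors[1:], errors[2:]):
        estimates.append(math.log(nxt / cur) / math.log(cur / prev))
    return estimates

# Hypothetical error sequence with e_{i+1} = e_i**2 (order 2, c = 1).
errors = [1e-1, 1e-2, 1e-4, 1e-8]
orders = estimate_order(errors)
```

The estimate does not need the constant c, which is convenient because c depends on the norm.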

We derive the rate of convergence of the n-dimensional interpolating algorithm
by establishing some difference relations for the errors $\|x_j - x^*\|$. To derive the
basic difference relation we need, we pass a curve in $R^n$ through the points $x^*$
and $x_{i+1}, x_i, \ldots, x_{i-m}$, i.e., we determine a function $\psi: R \to R^n$ such that

(4)    $\psi(t_{i-j}) = x_{i-j}$,   $j = -1, 0, 1, \ldots, m$,
       $\psi(t^*) = x^*$,


where the parameter $t$ is chosen so that

(5)    $t^* = 0$,   $|t_{i-j}| = \|x_{i-j} - x^*\|$,   $j = -1, 0, \ldots, m$.


We will later discuss this construction. Note, however, that the construction
of $\psi$ is a part of the analysis of the properties of the algorithm, not a part of
the algorithm itself.

Henceforth we will assume $0 \ne t_{i-j} \ne t_{i-k}$ for all $i$ and $j, k = -1, 0, \ldots, m$ ($j \ne k$).

This is a natural assumption. If $t_i = 0$ for some $i$, the algorithm terminates,
while the assumption $t_{i-j} \ne t_{i-k}$, $j, k = -1, \ldots, m$, has to be made even in the
one-dimensional case (cf. Traub [14, Ch. 4]).

Now define $\theta(t) = T(\psi(t))$, $\varphi(t) = f(\psi(t))$. Note that $\psi$, $\theta$ and $\varphi$ depend on $i$.
Following Traub [14] and Tamir [12,13], we will not make this dependence explicit
in order to simplify notation. Equations (2) and (4) imply

(6)    $\theta^{(k)}(t_{i-j}) = \varphi^{(k)}(t_{i-j})$,   $j = 0, \ldots, m$;  $k = 0, \ldots, s-1$.


It follows that (6) defines a one-dimensional interpolation problem for which a
convenient error formula exists (see Ostrowski [10, p. 12]). Henceforth, we assume
$\theta, \varphi \in C^r$ in a neighborhood of $t = 0$, where $r = s(m+1)$. Using the
one-dimensional error formula we have

(7)    $\varphi(t) - \theta(t) = \dfrac{\varphi^{(r)}(\xi) - \theta^{(r)}(\xi)}{r!} \prod_{j=0}^{m} (t - t_{i-j})^{s}$,

where $\xi$ is a point in the interval determined by $t, t_i, \ldots, t_{i-m}$. Note that
formula (7) holds for general (not necessarily polynomial) interpolation.

We now differentiate (7) and set $t = 0$. From (1) and (3) we have
$\varphi'(0) = \theta'(t_{i+1}) = 0$, so that $\varphi'(0) - \theta'(0) = -\theta'(0) = \theta'(t_{i+1}) - \theta'(0) = t_{i+1}\theta''(\zeta)$,
where $\zeta$ is a point between $t_{i+1}$ and 0. Differentiating the right hand side of
(7) using Ralston's result [11] on the differentiation of the error term, generalized
for the hyperosculatory case (see [2]), we finally have
Lemma 1. Under the assumptions made above, the errors in the n-dimensional
interpolation algorithm satisfy the difference relation

(8)    $t_{i+1} = M_i \sum_{k=0}^{m} t_{i-k}^{s-1} \prod_{j=0,\, j \ne k}^{m} t_{i-j}^{s} + N_i \prod_{j=0}^{m} t_{i-j}^{s}$,

where

$M_i = \dfrac{M(\xi_i(t_{i+1}))\,(-1)^{r-1}\, s}{\theta''(\zeta(t_{i+1}))}$,   $N_i = \dfrac{N(\eta_i(t_{i+1}))\,(-1)^{r}}{\theta''(\zeta(t_{i+1}))}$,

$M(t) = \dfrac{\varphi^{(r)}(t) - \theta^{(r)}(t)}{r!}$,   $N(t) = \dfrac{\varphi^{(r+1)}(t) - \theta^{(r+1)}(t)}{(r+1)!}$,

and where $\xi_i(t)$, $\eta_i(t)$ are in the interval determined by $t, t_i, \ldots, t_{i-m}$,
and $\zeta(t_{i+1})$ is in the interval determined by $t_{i+1}$, 0. □
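The passage from the error formula (7) to the relation (8) can be spelled out as follows. This is a sketch, assuming that Ralston's result allows the derivative of the error coefficient to be written as $N(\eta)$ for some intermediate point $\eta$; here $\varphi = f \circ \psi$, $\theta = T \circ \psi$, $r = s(m+1)$, and $M$, $N$ denote the scaled derivatives of order $r$ and $r+1$ of $\varphi - \theta$.

```latex
\begin{align*}
\varphi'(t)-\theta'(t)
  &= N(\eta)\prod_{j=0}^{m}(t-t_{i-j})^{s}
   + s\,M(\xi)\sum_{k=0}^{m}(t-t_{i-k})^{s-1}
     \prod_{\substack{j=0\\ j\neq k}}^{m}(t-t_{i-j})^{s}, \\
\intertext{so that at $t=0$, using $\varphi'(0)-\theta'(0)=t_{i+1}\,\theta''(\zeta)$,}
t_{i+1}\,\theta''(\zeta)
  &= (-1)^{r}\,N(\eta)\prod_{j=0}^{m}t_{i-j}^{s}
   + (-1)^{r-1}\,s\,M(\xi)\sum_{k=0}^{m}t_{i-k}^{s-1}
     \prod_{\substack{j=0\\ j\neq k}}^{m}t_{i-j}^{s}.
\end{align*}
```

The signs come from counting factors: the full product contains $s(m+1) = r$ factors of $-t_{i-j}$, and each term of the sum contains $r - 1$ of them. Dividing by $\theta''(\zeta)$ gives the difference relation of Lemma 1.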

It follows from (8) that if the initial errors $t_0, \ldots, t_m$ are small enough,
and if the coefficients $\{M_i\}$, $\{N_i\}$ are bounded, the sequence $\{t_i\}$ converges to
zero, i.e., $x_i \to x^*$. Moreover, if $s \ge 2$, (8) implies

(9)    $t_{i+1}/t_i \to 0$

(i.e., superlinear convergence). If $s = 1$, we assume $m \ge 2$. For $m = 2$, (8) is
the basic difference relation governing the behavior of the Quadratic Fit algorithm,
which is known to converge superlinearly (see Theorem 3.4.1 in Brent [3]). It is
evident from (8) that the rate for $m > 2$ is not less than the rate for $m = 2$.
Therefore, (9) holds for all $s \ge 1$, $m \ge 0$ if $r = s(m+1) \ge 3$. Rewriting (8) in
the form

$t_{i+1} = t_i^{s-1} \prod_{j=1}^{m} t_{i-j}^{s} \Bigl\{ M_i \Bigl[ 1 + \sum_{k=1}^{m} \dfrac{t_i}{t_{i-k}} \Bigr] + N_i t_i \Bigr\}$,

we finally have
Lemma 2. Under the assumptions made above, and if $M_i \to M \ne 0$, the sequence $\{t_i\}$
satisfies the difference relation

(10)    $t_{i+1} = A_{i+1}\, t_i^{s-1} \prod_{j=1}^{m} t_{i-j}^{s}$,

with $A_i \to M$. □

Defining $y_i = \log|t_i|$ and $B_i = \log|A_i|$, (10) implies the difference equation

$y_{i+1} - (s-1)\,y_i - s \sum_{j=1}^{m} y_{i-j} = B_{i+1}$

with indicial equation

(11)    $t^{m+1} - (s-1)\,t^{m} - s \sum_{j=0}^{m-1} t^{j} = 0$,

where the sum in (11) is taken as zero if m = 0.
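The positive root of the indicial equation is easy to compute numerically. The following is a minimal sketch using bisection; the function names and the bracket [0, s + 1] are choices made here (the left-hand side is nonpositive at 0 and equals (s+1)^m + 1 > 0 at s + 1), not part of the original analysis.

```python
def indicial_poly(t, s, m):
    """Left-hand side of (11): t^(m+1) - (s-1) t^m - s * sum_{j<m} t^j."""
    return t ** (m + 1) - (s - 1) * t ** m - s * sum(t ** j for j in range(m))

def positive_root(s, m, iters=200):
    """Bisection for the positive root of (11) on the bracket [0, s + 1]."""
    lo, hi = 0.0, float(s + 1)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if indicial_poly(mid, s, m) > 0.0:
            hi = mid          # root lies to the left of mid
        else:
            lo = mid          # root lies at or to the right of mid
    return 0.5 * (lo + hi)

# Sample values: one point with second derivatives, two points with
# gradients, and three points with function values only.
rate_one_point = positive_root(s=3, m=0)
rate_two_point = positive_root(s=2, m=1)
rate_quadratic_fit = positive_root(s=1, m=2)
```

The coefficients of the polynomial change sign exactly once, so by Descartes' rule of signs there is only one positive root for the bisection to find.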

Tamir [12,13] proves that under our assumptions the C-rate of convergence of
the sequence $\{t_i\}$ (hence $\{x_i\}$) is given by the unique positive root of the
indicial equation (11). In this case, the C-, Q-, and R-rates of convergence are
exactly p, where p is the positive solution of (11).

If the limit of $M_i$ exists and is zero, or if this limit does not exist but
the sequences $\{M_i\}$, $\{N_i\}$ are bounded, equation (8) can be rewritten in the form

$t_{i+1} = t_i^{s-1} \prod_{j=1}^{m} t_{i-j}^{s} \Bigl\{ M_i \Bigl[ 1 + \sum_{k=1}^{m} \dfrac{t_i}{t_{i-k}} \Bigr] + N_i t_i \Bigr\}$,

which implies that the Q- and R-rates of convergence are still at least p.
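The link between the difference relation for the errors and the rate p can be observed numerically. The sketch below iterates the logarithmic recursion y_{i+1} = (s-1) y_i + s (y_{i-1} + ... + y_{i-m}) under the simplifying assumption that the coefficient in front of the product of errors is identically 1, so its logarithm vanishes; the function name and starting values are hypothetical.

```python
def simulate_log_errors(s, m, steps=40):
    """Iterate y_{i+1} = (s-1) y_i + s * (y_{i-1} + ... + y_{i-m}).

    This is the logarithm of the error recursion with unit coefficient
    (a simplifying assumption), where y_i = log|t_i|.
    """
    y = [-1.0] * (m + 1)      # log|t_0| = ... = log|t_m| = -1
    for _ in range(steps):
        nxt = (s - 1) * y[-1] + s * sum(y[-1 - j] for j in range(1, m + 1))
        y.append(nxt)
    return y

# Two-point algorithm with gradients (s = 2, m = 1).
y = simulate_log_errors(s=2, m=1)
observed_rate = y[-1] / y[-2]
```

The ratio of consecutive log-errors approaches the dominant root of the indicial polynomial, here 2, since |t_{i+1}| behaves like |t_i|^p.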




We  now  summarize  our  results: 

Theorem 1. If equations (1)-(4) have solutions, if the functions $T, f, \psi \in C^{r+1}$,
if the sequences $\{M_i\}$ and $\{N_i\}$ are bounded, and if the initial errors of
the interpolation algorithm are small enough, then the sequence $\{x_i\}$ converges
to the solution $x^*$ with C- (when it exists), Q- and R-rates of convergence at
least p, where p is the unique positive solution of (11).

Corollary  1.  The  rate  of  convergence  of  the  sequence  generated  by  the  interpo¬ 
lation  algorithm  is  independent  of  the  interpolating  function. 

The reader should note that while the curve $\psi(t)$ may be constructed in
infinitely many ways, it is sufficient to establish the existence of just one such
curve (for each i). We now turn our attention to this problem.


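Lemma 3. If $t_0 = 0$ and $t_i \ne t_j$ for $i \ne j$, $i, j = 0, 1, \ldots, k$, the determinant

$\begin{vmatrix} 1 & t_0^2 & t_0^3 & \cdots & t_0^{k+1} \\ 1 & t_1^2 & t_1^3 & \cdots & t_1^{k+1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & t_k^2 & t_k^3 & \cdots & t_k^{k+1} \end{vmatrix}$

does not vanish.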

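Proof. Since $t_0 = 0$, the first row of the determinant is $(1, 0, \ldots, 0)$, and we have

$\begin{vmatrix} t_1^2 & t_1^3 & \cdots & t_1^{k+1} \\ t_2^2 & t_2^3 & \cdots & t_2^{k+1} \\ \vdots & \vdots & & \vdots \\ t_k^2 & t_k^3 & \cdots & t_k^{k+1} \end{vmatrix} = \prod_{i=1}^{k} t_i^2 \begin{vmatrix} 1 & t_1 & \cdots & t_1^{k-1} \\ 1 & t_2 & \cdots & t_2^{k-1} \\ \vdots & \vdots & & \vdots \\ 1 & t_k & \cdots & t_k^{k-1} \end{vmatrix}$.

The last determinant is a Vandermonde determinant, and since $t_i \ne t_j$ for $i \ne j$, $i, j = 1, \ldots, k$,
it does not vanish. The factor $\prod t_i^2$ is nonzero because $t_i \ne t_0 = 0$ for $i \ge 1$. □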
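The factorization used in this proof can be checked in exact arithmetic on a small example. The sketch below is only a sanity check under stated assumptions: the sample points are hypothetical, and the helper names `det`, `lemma3_matrix` and `predicted` are introduced here for illustration.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by exact Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n, sign, result = len(a), 1, Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)            # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign                  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result

def lemma3_matrix(t):
    """Rows (1, t_i^2, t_i^3, ..., t_i^(k+1)) for i = 0..k, k = len(t) - 1."""
    k = len(t) - 1
    return [[Fraction(1)] + [Fraction(ti) ** p for p in range(2, k + 2)]
            for ti in t]

def predicted(t):
    """prod_{i>=1} t_i^2 times the Vandermonde product over t_1, ..., t_k."""
    value = Fraction(1)
    for i in range(1, len(t)):
        value *= Fraction(t[i]) ** 2
        for j in range(i + 1, len(t)):
            value *= Fraction(t[j]) - Fraction(t[i])
    return value

# Hypothetical sample with t_0 = 0 and distinct entries.
t = [0, 1, 2, Fraction(-1, 2)]
determinant = det(lemma3_matrix(t))
```

For this sample both the determinant and the predicted product equal 15/4, which is nonzero as the lemma asserts.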

Lemma 4. Let $p(t) = at + \sum_{i=0,\, i \ne 1}^{k+1} a_i t^i$, and let $t_0 = 0$, $t_i \ne t_j$ for $i \ne j$, $i, j = 0, 1, \ldots, k$.
Then the system of equations

$p(t_j) = b_j$,   $j = 0, \ldots, k$,

for the unknowns $a_i$, $i = 0, 2, 3, \ldots, k+1$, has a solution for all $a$, $b_j$.

Proof. This is an immediate consequence of Lemma 3. □

Since $p'(0) = a$ and $p(t)$ is a polynomial, taking $k = m + 2$ in Lemma 4 we
have

Lemma 5. If the errors satisfy $0 \ne t_{i-j} \ne t_{i-k}$ for $j \ne k$, $j, k = -1, 0, \ldots, m$,
there exists a curve $\psi \in C^\infty$ satisfying (4). Moreover, we may require $\psi'(0) = a$
with $a \in R^n$ arbitrary. □
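The construction behind this lemma can be sketched for a single component of the curve: a polynomial with prescribed values at distinct parameters (one of them 0) and a prescribed slope at 0. The code below is an illustrative sketch in exact arithmetic; the helper names and the sample data are hypothetical, and a vector-valued curve would be obtained by applying it to each coordinate in turn.

```python
from fractions import Fraction

def solve(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A nonsingular)."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]      # normalize pivot row
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def curve_component(t, values, slope):
    """Coefficients c[0..k+1] of p with p(t_j) = values[j] and p'(0) = slope."""
    k = len(t) - 1
    powers = [0] + list(range(2, k + 2))     # exponents of the unknowns
    A = [[Fraction(tj) ** p for p in powers] for tj in t]
    rhs = [Fraction(v) - Fraction(slope) * Fraction(tj)
           for tj, v in zip(t, values)]
    coeffs = [Fraction(0)] * (k + 2)
    for p, c in zip(powers, solve(A, rhs)):
        coeffs[p] = c
    coeffs[1] = Fraction(slope)              # the linear term is prescribed
    return coeffs

def evaluate(coeffs, t):
    """Evaluate the polynomial with the given coefficients at t."""
    return sum(c * Fraction(t) ** i for i, c in enumerate(coeffs))

# Hypothetical data: t[0] = 0 plays the role of the parameter of x*.
t = [0, 1, -1, 2]
values = [5, 1, 3, 2]
coeffs = curve_component(t, values, slope=7)
```

The linear system is solvable for any data because its matrix is the one whose determinant was shown to be nonzero in Lemma 3.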

We can now state our main result.

Theorem 2. Assume that equations (1)-(4) have solutions, f has continuous
derivatives of order r + 1, the parameters of T depend continuously on the data through
(2), and T has continuous derivatives of order r + 1 for the appropriate values of
the parameters. Assume also that $\nabla^2 f(x^*) \ne 0$ and $0 \ne t_{i-j} \ne t_{i-k}$ for all $i$ and
$j, k = -1, 0, \ldots, m$ ($j \ne k$). Then, if the initial errors of the interpolation algorithm are
small enough, the sequence $\{x_i\}$ converges to the solution $x^*$ with C- (when it
exists), Q- and R-rates of convergence at least p, where p is the unique positive
solution of (11).

Proof. By the above it is sufficient to show that under the assumptions of
the theorem, the sequences $\{M_i\}$, $\{N_i\}$ are bounded. This is the case if
$\psi'(0)^T \nabla^2 f(x^*)\, \psi'(0) \ne 0$, for which it is sufficient to choose the vector a in Lemma 5
as an eigenvector of $\nabla^2 f(x^*)$ corresponding to a nonzero eigenvalue. □

Summary  and  Conclusion 

We have shown that the rate of convergence of n-dimensional interpolation
algorithms is inherited from the underlying one-dimensional interpolation, that it is
independent of the interpolating functions, and is given by the unique positive solution of
the equation

(12)    $t^{m+1} - (s-1)\,t^{m} - s \sum_{j=0}^{m-1} t^{j} = 0$,

where m + 1 interpolation points and s derivatives (of orders zero to s - 1)
are used.

Our work is based on the results of Traub [14] and Ostrowski [10] for the
one-dimensional root-finding problem. Tamir [12,13] adapted these results for the
minimization problem. In [12] he studies the rate of convergence of algorithms using
function values only ($m = 0$) with a superfluous assumption and a false conjecture.
This detailed analysis is repeated in [13] for the case $m > 0$. He treats polynomial
interpolation only, and shows that for fixed s and $m \to \infty$,


the  rate  p  tends  to 


However, he neglects to realize the effect of memory on the rate of convergence,
which is implied by (11) and (12).

Indeed, for fixed s, the rate is obtained for $m = 0$ and $m = 1$ by solving
the indicial equations $t - (s-1) = 0$ and $t^2 - (s-1)t - s = 0$, respectively.
Therefore, $p = s - 1$ for $m = 0$, $p = s$ for $m = 1$ and $p = \frac{s}{2} + \sqrt{(\frac{s}{2})^2 + 1}$ for $m = \infty$. It
follows that algorithms using more than two interpolation points are inefficient,
and two-point algorithms are substantially faster than one-point algorithms.
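The limiting value for an unbounded number of points can be recovered from the indicial polynomial by a short calculation. The following is a sketch of that step, assuming the positive root exceeds 1 so that $t^{-m} \to 0$; multiplying by $t - 1$ introduces the spurious root $t = 1$, which is discarded.

```latex
\begin{align*}
(t-1)\Bigl[t^{m+1}-(s-1)\,t^{m}-s\sum_{j=0}^{m-1}t^{j}\Bigr]
  &= t^{m+2}-s\,t^{m+1}+(s-1)\,t^{m}-s\,(t^{m}-1) \\
  &= t^{m+2}-s\,t^{m+1}-t^{m}+s .
\end{align*}
Dividing by $t^{m}$ and letting $m\to\infty$ with $t>1$:
\[
t^{2}-s\,t-1=0
\quad\Longrightarrow\quad
p=\frac{s}{2}+\sqrt{\Bigl(\frac{s}{2}\Bigr)^{2}+1}.
\]
```

For s = 2 this gives $1 + \sqrt{2} \approx 2.414$, compared with p = 2 for two points, which quantifies how little is gained from additional memory.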

In particular, for $m = 1$ and $s = 2$ we have a two-point algorithm using
first-order information with second-order rate of convergence (which is a well-known
result in the one-dimensional case).

Note that no line search is needed in this class of algorithms, and that they
may be designed to locate saddle points rather than minimum points. A line search,
however, may serve as part of a globalizing procedure.

Compare also the discussion in Davidon [5] regarding the difficulty of
determining the effect of memory on the performance of descent algorithms.

We  have  not  computed  the  asymptotic  error  constant,  since  it  depends  on  the 
norm  used  (see  Ortega  and  Rheinboldt  [9]).  This  can  be  computed,  however,  under 
the  appropriate  assumptions  (cf.  Tamir  [12,13]).  We  have  also  made  no  attempt  at 
giving  the  strongest  results  (i.e.,  the  weakest  assumptions)  possible.  Compare,  for 
example,  Brent  [3]. 

Finally, note that Theorem 1 holds for infinite dimensional spaces, and that
our analysis is applicable with the obvious modifications to the solution of systems
of equations, for which the indicial equation analogous to (11) is